Measures of Statistical Complexity: Why?
We review several statistical complexity measures proposed
over the last decade and a half as general indicators of
structure or correlation. Recently, Lopez-Ruiz, Mancini, and 
Calbet [Phys. Lett. A 209 (1995) 321] introduced another 
measure of statistical complexity Clmc that, like others, sat- 
isfies the "boundary conditions" of vanishing in the extreme 
ordered and disordered limits. We examine some properties of 
Clmc and find that it is neither an intensive nor an extensive 
thermodynamic variable and that it vanishes exponentially in 
the thermodynamic limit for all one-dimensional finite-range 
spin systems. We propose a simple alteration of Clmc that 
renders it extensive. However, this remedy results in a quan- 
tity that is a trivial function of the entropy density and hence 
of no use as a measure of structure or memory. We conclude 
by suggesting that a useful "statistical complexity" must not 
only obey the ordered-random boundary conditions of van- 
ishing, it must also be defined in a setting that gives a clear 
interpretation to what structures are quantified.

I. STATISTICAL COMPLEXITY MEASURES

Theoretical physics has long possessed a general mea- 
sure of the uncertainty associated with the behavior of 
a probabilistic process: the Shannon entropy of the un- 
derlying distribution — a quantity originally intro- 
duced by Boltzmann over 100 years ago. In the '50's, 
Kolmogorov and Sinai adapted Shannon's informa- 
tion theory to the study of dynamical systems. This 
work formed the foundation for the statistical character- 
ization of deterministic sources of apparent randomness 
in the late '60's through the early '80's. These efforts to 
describe the unpredictability of dynamical systems were 
largely successful. The metric entropy, Lyapunov expo- 
nents, and fractal dimensions now provide widely appli- 
cable quantities that can be used to detect the presence 
of and to quantify the degree of deterministic chaotic be- 
havior. 

Since that time, though, it has become better appreciated
that measuring the randomness and unpredictability
of a system fails to adequately capture the correlational
structure in its behavior. Structure here is taken
to be a statement about the relationship between a sys- 
tem's components. Roughly speaking, the larger and 
more intricate the "correlations" between the system's 
constituents, the more structured the underlying distri- 
bution. Structure and correlation are not completely in- 
dependent of randomness, however. It is generally agreed 
that both maximally random and perfectly ordered systems
possess no structure. Nevertheless, at a given
level of randomness away from these extremes, there can 
be an enormously wide range of differently structured 
processes. 

These realizations led to a considerable effort to de- 
velop a general measure that quantifies the degree of 
structure or pattern present in a process (cf. refs. [2]-[27]).
There are many ad hoc methods for detecting structure, 
but none are as widely applicable as entropy is for indi- 
cating randomness. The quantities that have been pro- 
posed as general structural measures are often referred 
to as "complexity measures." To reduce confusion, it 
has become convenient to refer to them instead as statis- 
tical complexity measures. In so doing they are immedi- 
ately distinguished from deterministic complexities, such 
as the Kolmogorov-Chaitin complexity ||r^-|20[| which re- 
quires the deterministic accounting of every bit — random 
or not — in an object. In contrast, statistical complexity 
measures discount for randomness and so provide a mea- 
sure of the regularities present in an object above and 
beyond pure randomness. Deterministic complexities are 
dominated by the random components in an object; the 
result is that their average-case growth rate is given by 
the Shannon entropy rate.

A number of approaches to measuring statistical com- 
plexity have been taken. One line of attack oper- 
ates within information theory and examines how the 
Shannon entropy of successively larger subsystems con- 
verges to the entropy density of the entire system 
[^,|[||Jll|,|l^,|lJ] . These quantities can be interpreted as 
the average memory stored in configurations. 

Another set of approaches appeals to computation 
theory's classification of devices that recognize different
classes of formal language (a set of strings). Examples
include finite memory devices (e.g. the finite-state
machines) and infinite memory devices (e.g. pushdown
automata and Turing machines) |2]|| . One such
computation-based measure of statistical complexity is 
the logical depth. The logical depth of (say) a system's
configuration is the time required for a universal
Turing machine to run the minimal program that repro- 
duces it. Another example of a computation theoretic ap- 
proach is found in the statistical complexity of refs. ||7|,|l5|, 
a quantity that measures the amount of memory needed, 
on average, to statistically reproduce a given configura- 
tion. Unlike logical depth, which assumes the use of a 
universal Turing machine — the most powerful discrete 
computational model class — this statistical complexity 
assumes that the simplest possible computational class 
is used to describe the configuration. A higher level class 
is used only if lower ones fail to admit a finite description. 
This "bottom-up" definition of hierarchical information 
processing has been successfully applied to the symbolic 
dynamics of chaotic maps , cellular automata , 
spin systems |Q, and hidden Markov models Q. For 
other approaches to statistical complexity see, for exam- 
ple, refs. P,pl,plp7l|2l. 



II. PROPERTIES OF Clmc 

Recently, Lopez-Ruiz, Mancini, and Calbet proposed
another measure of statistical complexity Clmc [26].
Consider a discrete random variable Y that can take on
N values y. We denote by $\Pr(y)$ the probability that the
variable Y assumes the value y. Ref. [26] then defines a
complexity measure:



$$ C_{\mathrm{LMC}}[Y] = H[Y]\, D[Y], \tag{1} $$

where H is the Shannon entropy,

$$ H[Y] = -\sum_{\{y\}} \Pr(y) \log_2 \Pr(y), \tag{2} $$



in which the sum runs over all allowed values of y. The 
quantity D is the "disequilibrium," defined by 



$$ D[Y] = \sum_{\{y\}} \left( \Pr(y) - \frac{1}{N} \right)^2, \tag{3} $$



which measures the departure of Pr(y) from uniformity. 

The motivation posited in ref. [26] for the form of Clmc
is that it vanishes for distributions that correspond to
perfect order and maximal randomness. Ref. [26] argues
that perfect order corresponds to a vanishing Shannon
entropy and notes that for H = 0, Clmc = 0. Maximal
randomness occurs for $H = \log_2 N$, corresponding
to $\Pr(y) = 1/N$. And so, by eq. (3), D and hence Clmc
equal zero. Thus, by construction, Clmc vanishes in the 
extreme ordered and disordered limits. 
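
As a quick numerical check of these boundary conditions, the following minimal sketch evaluates eqs. (1)-(3) directly. The function names and the three example distributions are illustrative choices, not taken from ref. [26].

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits, eq. (2); zero-probability events contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def disequilibrium(probs):
    """Squared distance from the uniform distribution over N events, eq. (3)."""
    n = len(probs)
    return sum((p - 1.0 / n) ** 2 for p in probs)

def c_lmc(probs):
    """Product of entropy and disequilibrium, eq. (1)."""
    return shannon_entropy(probs) * disequilibrium(probs)

# Illustrative distributions over N = 4 events (hypothetical values).
ordered = [1.0, 0.0, 0.0, 0.0]   # perfect order: H = 0
uniform = [0.25] * 4             # maximal randomness: D = 0
skewed = [0.7, 0.1, 0.1, 0.1]    # in between: both factors are positive

for name, dist in [("ordered", ordered), ("uniform", uniform), ("skewed", skewed)]:
    print(name, c_lmc(dist))
```

Only the intermediate distribution yields a nonzero value; the two extremes vanish because one of the two factors is zero in each case.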

We now proceed to discuss the behavior of Clmc in 
the thermodynamic limit. Anteneodo and Plastino have
already reported some of Clmc's properties in this limit
[27]. However, their line of investigation is rather different
from that undertaken here. In ref. [27] the distribution
that maximizes Clmc is determined. Numerical and
analytic work there indicates that the maximizing distribution
for $N \to \infty$ is one in which a single event has
probability 2/3 while all others are equally likely. 

As an alternative to looking at the maximizing distribution,
we suggest examining how a system's complexity
changes as parameters (e.g., temperature, coupling
strength, nonlinearity, etc.) are varied. This approach
is in keeping with statistical mechanics, where one typ- 
ically looks at changes in the behavior of quantities in 
the thermodynamic limit as system parameters are var- 
ied. For example, rather than determine the distribution 
that maximizes the expectation value of the specific heat 
for a finite-sized system, one usually determines how the 
specific heat per site behaves in the limit of an infinite 
system as, say, the temperature is varied. 

Consider the class of probability distributions over dis- 
crete, finite random variables generated by finite-memory 
Markov chains. Let $\ldots, X_{-2}, X_{-1}, X_0, X_1, X_2, \ldots$ be a
bi-infinite chain of random variables where each value
$X_i$ is chosen from a discrete finite alphabet of size
$k$. We denote $L$ consecutive variables by $X_i^L =
X_i, X_{i+1}, X_{i+2}, \ldots, X_{i+L-1}$. $X_i^L$ is a system of $L$
variables with $N = k^L$ possible configurations $x_i^L$. Let
$\Pr(x_i)$ be the probability that the $i$th random variable
takes on the particular value $x_i$. We denote by $\Pr(x_i^L)$
the joint probability distribution over $L$ consecutive random
variables. We assume a shift symmetry so that
$\Pr(x_i^L) = \Pr(x_0^L)$ and subsequently drop the subscript
$i = 0$. The chain of discrete random variables may
be viewed as a translationally invariant spin system or, 
equivalently, as a stationary stochastic process. 

As is well known, the Shannon entropy of a block of 
L such variables typically grows linearly for sufficiently 
large L. In other words, the limit 



$$ h_\mu \equiv \lim_{L \to \infty} \frac{1}{L} H[X_0, X_1, \ldots, X_{L-1}] \tag{4} $$

exists and, in the thermodynamic ($L \to \infty$) limit,

$$ H[X^L] \propto h_\mu L. \tag{5} $$

The quantity $h_\mu$ is well-defined as the system size goes
to infinity and is known as the entropy rate, the metric
entropy, or the thermodynamic entropy density depending
on the context. In statistical mechanics parlance,
eq. (5) tells us that the Shannon entropy H is an extensive
quantity, so that it is possible to define a meaningful
entropy density that characterizes the randomness per 
variable in the system. 

The "disequilibrium" term, eq. (3), is not so well behaved.
In fact, under no circumstances does it grow linearly
with system size L. One can show, quite generally,
that for a system of length L described by a probability
distribution over $N = k^L$ events, D is bounded above by
$1 - 1/N$. To see this, we expand the square in eq. (3) and,
since the probability distribution is normalized, obtain

$$ D[X^L] = \sum_{\{x^L\}} \Pr(x^L)^2 - \frac{1}{N}. \tag{6} $$

The sum is understood to run over all possible values
of $X^L$. Since $\Pr(x^L) \le 1$, it follows that

$$ \sum_{\{x^L\}} \Pr(x^L)^2 \le \sum_{\{x^L\}} \Pr(x^L) = 1. \tag{7} $$

Thus,

$$ D[X^L] \le 1 - \frac{1}{N}, \tag{8} $$



and we see that D cannot grow linearly with the system 
size L. Therefore, the "disequilibrium" is not a thermo- 
dynamically extensive quantity. 

In fact, it can be shown that D vanishes exponentially 
with increasing system size for a large class of systems. 
Let our chain of variables $\ldots, X_{-1}, X_0, X_1, X_2, \ldots$ be chosen
by a one-step Markov process with transition probabilities
given by $T_{ab} = \Pr(b|a)$, $a, b \in \{0, 1, \ldots, k-1\}$.
That is, $T_{ab}$ gives the probability that the variable $X_{i+1}$
takes on the $b$th value given that $X_i$ takes on the $a$th
value. (What follows applies to any finite-step Markov
chain, as blocks of adjacent variables can be grouped to
render the process one-step.) Then, if we assume that
the Markovian process is regular (i.e., there exists some
$K$ such that $(T^K)_{ab} > 0$ for all a and b), it follows
that the disequilibrium of a system of L variables goes to
zero exponentially fast in L. This is proved in appendix A.

As a result, Clmc vanishes in the thermodynamic limit 
for all regular Markov chains, a class of systems that in- 
cludes all finite-range, one-dimensional spin systems with 
finite-strength interactions. It seems to us counterintu- 
itive that a (useful) measure of complexity vanishes for 
all of these systems. While these models exhibit no crit- 
ical phenomena, there are considerable changes in the 
structure of the distributions as system parameters are 
varied [23]. A measure of complexity, as we envision it,
should be sensitive to these changes. 
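
The exponential decay of D can be observed numerically. The sketch below is an illustration under stated assumptions: the two-state transition matrix is a hypothetical example, and the block sum of squared probabilities is computed with the squared-entry matrix used in appendix A, then cross-checked against brute-force enumeration for short blocks.

```python
import itertools

# Hypothetical two-state regular chain; T[a][b] = Pr(b | a).
T = [[0.9, 0.1],
     [0.4, 0.6]]
# Stationary distribution of a two-state chain.
norm = T[0][1] + T[1][0]
p = [T[1][0] / norm, T[0][1] / norm]

def block_prob(block):
    """Probability of a block of consecutive symbols from the stationary chain."""
    prob = p[block[0]]
    for a, b in zip(block, block[1:]):
        prob *= T[a][b]
    return prob

def sum_sq_brute(L):
    """Sum of squared block probabilities by enumerating all 2**L blocks."""
    return sum(block_prob(x) ** 2 for x in itertools.product((0, 1), repeat=L))

def sum_sq_transfer(L):
    """Same sum, obtained by repeated multiplication with the squared-entry matrix."""
    v = [q * q for q in p]
    T_sq = [[t * t for t in row] for row in T]
    for _ in range(L - 1):
        v = [sum(v[a] * T_sq[a][b] for a in range(2)) for b in range(2)]
    return sum(v)

def disequilibrium(L):
    """D for a block of L binary variables, eq. (6) with N = 2**L."""
    return sum_sq_transfer(L) - 2.0 ** (-L)

values = [disequilibrium(L) for L in range(1, 41)]
ratios = [values[i + 1] / values[i] for i in range(len(values) - 1)]
print(values[-1], ratios[-1])
```

For large L the ratio of successive values settles at a constant below one, which is the signature of exponential decay; for short blocks D need not be monotonic.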

As an illustration of the nonextensivity of D, consider 
the special case where the chain consists of variables 
that are independent and identically distributed (iid); 
i.e., $\Pr(x_i, x_j) = \Pr(x_i)\Pr(x_j)$ for all $x_i$ and $x_j$ with
$i \neq j$, where $i, j = \ldots, -2, -1, 0, 1, 2, \ldots$. For a spin
system, this corresponds to the case where there is no coupling between
spins: a paramagnet. For convenience we assume that
$X_i$ can take on two values, (say) 0 and 1, and we denote
the probability that $X_i$ takes on the value 1 by p. Then,
using the binomial theorem, we find that Clmc for a 
system of L such variables is given by: 



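$$ C_{\mathrm{LMC}}[X^L] = L\, H[X] \left( (1 - 2p + 2p^2)^L - 2^{-L} \right), \tag{9} $$

where $H[X]$ is the binary entropy function:

$$ H[X] = -p \log_2 p - (1-p) \log_2 (1-p). \tag{10} $$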



Eq. (9) is rather curious. In our view, the complexity
of a collection of iid binary variables should vanish re- 
gardless of their number. A set of independent variables 
is statistically very simple — there is manifestly no cor- 
relational structure whatsoever. Furthermore, it seems 
to us that if the complexity doesn't vanish for all such 
systems, it ought to grow linearly as a function of the 
number of variables. That is, six biased coins should be 
twice as complex as three biased coins. Eq. (9) shows
that the size dependence of Clmc is much more compli- 
cated. 
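
The size dependence can be made concrete with a short numerical sketch. It compares the closed form for L independent biased coins, $L\,H[X]\,((1 - 2p + 2p^2)^L - 2^{-L})$, against a brute-force evaluation of H times D over all $2^L$ configurations. The bias p = 0.3 and the function names are hypothetical choices made for illustration.

```python
import itertools
import math

def binary_entropy(p):
    """Binary entropy in bits of a coin with Pr(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def c_lmc_closed_form(p, L):
    """Closed form for L iid binary variables."""
    return L * binary_entropy(p) * ((1 - 2 * p + 2 * p * p) ** L - 2.0 ** (-L))

def c_lmc_brute_force(p, L):
    """H * D computed from the full joint distribution of L independent coins."""
    n = 2 ** L
    probs = []
    for x in itertools.product((0, 1), repeat=L):
        ones = sum(x)
        probs.append(p ** ones * (1 - p) ** (L - ones))
    entropy = -sum(q * math.log2(q) for q in probs if q > 0)
    diseq = sum((q - 1.0 / n) ** 2 for q in probs)
    return entropy * diseq

p = 0.3  # hypothetical bias
three = c_lmc_closed_form(p, 3)
six = c_lmc_closed_form(p, 6)
print(three, six)
```

With this bias, six coins come out with a smaller value than three coins, so the measure neither vanishes nor scales linearly with the number of independent variables.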

In ref. [27] it is found that the maximal value of Clmc
goes to 4/27 as the system size goes to infinity. This is not
at odds with the exponentially fast vanishing of D (and
hence Clmc) for regular Markov processes noted here,
since the system that maximizes Clmc is not Markovian.
To see this, recall that the maximal distribution
reported in ref. [27] has one configuration with probability
2/3 while all others have equal probability. In the
thermodynamic limit, this one configuration is infinitely 
long. Thus, the generating process must keep track of 
arbitrarily long sequences in order to assign a distinct 
probability to one and another probability uniformly to 
all others. As a result, the distribution that maximizes 
Clmc cannot be generated by a finite-memory Markov 
process in the thermodynamic limit. 



III. REPAIRING NONEXTENSIVITY 

Up to this point we have seen that Clmc is not suitable 
for use in a statistical mechanics context. In particular, it 
suffers from two related deficiencies: it is not an extensive 
quantity and it vanishes for a large variety of structured 
processes. The trouble causing both of these shortcom- 
ings resides in the "disequilibrium" factor. Perhaps if one 
altered the definition of Clmc so that D was extensive, 
as the Shannon entropy H is, one would obtain a more 
useful measure of statistical complexity. 

To this end, we seek an extensive measure of a distribu- 
tion's departure from uniformity. As we will be multiply- 
ing this measure by the Shannon entropy H which carries 
units of bits, it also seems natural, although not neces- 
sary, to choose a "disequilibrium-like" quantity that also 
carries units of bits. Information theory is armed with 
just such a function: the relative information [Q. 

The relative information, also known as the information
gain or the Kullback-Leibler information distance,
between two distributions $\Pr(y)$ and $\widehat{\Pr}(y)$ is defined by

$$ \mathcal{D}\big(\Pr(y) \,\|\, \widehat{\Pr}(y)\big) \equiv \sum_{\{y\}} \Pr(y) \log_2 \frac{\Pr(y)}{\widehat{\Pr}(y)}. \tag{11} $$

The relative information is not a true distance function:
it neither satisfies the triangle inequality nor is symmetric.
Nevertheless, it does provide a measure of how
much two distributions differ and it does carry the same
units (bits) as the Shannon entropy. It is also an extensive
quantity, since it grows linearly with the number of
variables in the distribution's support. 

So, $\mathcal{D}\big(\Pr(y) \,\|\, \widehat{\Pr}(y)\big)$ where $\widehat{\Pr}(y) = 1/N$ provides an
extensive measure of $\Pr(y)$'s departure from uniformity
in units of bits. Using this in eq. (1), we define a modified
statistical complexity measure:

$$ C'[Y] \equiv H[Y]\, \mathcal{D}\big(\Pr(y) \,\|\, 1/N\big), \tag{12} $$

which has units of [bits$^2$]. From here on we will focus on
$C'$, the modified Clmc. Note that much of $C'$'s character
is shared by Clmc.

To see how this new quantity behaves, let's look more
closely at $\mathcal{D}(\Pr(y) \,\|\, 1/N)$. Consider again $X^L$, a Markov
chain of length L. And for convenience let the $X_i$ be
binary variables. The total number N of configurations
for such a system is $2^L$, so that $\log_2 N = L$. First, note that:

$$ \mathcal{D}\big(\Pr(x^L) \,\|\, 1/N\big) = L - H[X^L]. \tag{13} $$

Since H is extensive, we have:

$$ \mathcal{D}\big(\Pr(x^L) \,\|\, 1/N\big) \propto L(1 - h_\mu). \tag{14} $$

Thus, $C'$ consists of the product of two extensive quantities,
H and $\mathcal{D}(\Pr(x^L) \,\|\, 1/N)$. As a consequence, dividing
$C'$ by $L^2$ yields a quantity that is finite in the $L \to \infty$
limit; specifically,

$$ \lim_{L \to \infty} \frac{C'}{L^2} = h_\mu (1 - h_\mu). \tag{15} $$



(See Fig. 1.) This is indeed a function that vanishes in
the ordered ($h_\mu = 0$) and disordered ($h_\mu = 1$) extremes.
However, note that it is a function only of the system's
entropy density. That is, the modified statistical complexity
measure eq. (12) is a function only of the system's
randomness. The relation $C' \propto L^2 h_\mu (1 - h_\mu)$ strikes us
as being "over-universal." For example, it's possible for a
paramagnet and a system at its critical point, where the
correlation length diverges, to have the same value of $C'$.
It does not seem particularly revealing that all systems 
with the same entropy density have the same statistical 
complexity. 
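
The dependence of the modified measure on the entropy density alone can be checked in the simplest setting. The sketch below assumes L independent biased coins (so that the block entropy is exactly L times the single-coin entropy) and computes the relative information with respect to the uniform distribution by direct summation. The bias, block length, and function names are illustrative assumptions.

```python
import itertools
import math

def block_distribution(p, L):
    """Joint distribution of L independent coins with Pr(1) = p."""
    probs = []
    for x in itertools.product((0, 1), repeat=L):
        ones = sum(x)
        probs.append(p ** ones * (1 - p) ** (L - ones))
    return probs

def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

def relative_information_to_uniform(probs):
    """Relative information of probs with respect to the uniform distribution."""
    n = len(probs)
    return sum(q * math.log2(q * n) for q in probs if q > 0)

def c_prime(probs):
    """Modified measure: entropy times relative information to uniform."""
    return entropy_bits(probs) * relative_information_to_uniform(probs)

p, L = 0.3, 8                       # hypothetical bias and block length
h_mu = entropy_bits([p, 1 - p])     # entropy density of the iid process
probs = block_distribution(p, L)
density = c_prime(probs) / L ** 2
print(density, h_mu * (1 - h_mu))
```

For this iid case the two printed numbers agree at every L, not only in the limit, because the block entropy is exactly extensive.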

As a contrast to the $C'$ versus $h_\mu$ behavior shown in
Fig. 1, consider Fig. 2, where we show the behavior
of a different measure of statistical complexity, the
excess entropy E PJ^,p|, |rT| , p^|jl^ ] . The excess entropy of 
an infinite configuration may be expressed as the mutual 
information between two semi-infinite halves of the con- 
figuration. That is, E is the amount of spatial memory 
embedded in configurations. In ref. [23] we show that
E captures significant structural changes in the configu- 
rations of one-dimensional spin systems as external pa- 
rameters are varied. We have plotted the excess entropy 
for 10^ sets of parameter values for a 1D Ising system
with nearest-neighbor coupling in the presence of an external
field. Note that for all the Ising systems plotted
in Fig. 2, Clmc = 0, since Clmc vanishes in the thermodynamic
limit. Comparing the two figures, it is clear
that the excess entropy depends on the entropy density
in a much more subtle way than $C'$ does.

FIG. 1. $C'$ versus entropy density $h_\mu$. Note that $C'$ is a
function of $h_\mu$.

Comparing these two plots raises another important issue.
Note that the excess entropy does not always equal
zero for $h_\mu = 0$, an apparent violation of the "boundary
condition" requiring that a complexity measure vanish
in the perfectly ordered limit. However, $h_\mu = 0$ corresponds
to perfect asymptotic predictability, not perfect
order. A vanishing entropy rate indicates
that a process can be predicted without error; it says nothing,
however, about how much effort or memory is required to
perform this prediction. Thus, a zero value of the entropy
density is too crude a measure of order. To see this, note
that any periodic system has $h_\mu = 0$. Yet all periodic
systems aren't equally ordered: a configuration with period
1 is certainly more ordered than a configuration of
period 1729, which, for example, requires more memory
to produce. In fact, at the period-doubling accumulation
point of the logistic map, the symbolic dynamics produce
periodic configurations of diverging periodicity. Hence,
the excess entropy is infinite here, while the entropy rate
remains zero.

In contrast to $C'$, which vanishes for any periodic system,
the excess entropy E for a configuration of period P
is $\log_2 P$. Only if the period is 1, indicating trivial ordering
and predictability, does the excess entropy vanish for
a periodic process. Thus, the $(h_\mu, E) = (0, 0)$ points in
Fig. 2 correspond to the system's ferromagnetic ground
states of period 1 and the $(h_\mu, E) = (0, 1)$ points correspond
to the antiferromagnetic ground states of period 2
[23]. Statistical complexity measures such as $C'$ or, for
that matter, Clmc, that are zero for all $h_\mu = 0$ configurations
are very blunt implements with which to detect
structure. A measure of complexity should be able to
distinguish between structures of different periodicities.

FIG. 2. Excess entropy E, a statistical complexity measure,
versus entropy density $h_\mu$ for a spin-1/2 one-dimensional Ising
spin system: 10** $(h_\mu, E)$ points. The system parameters were
randomly chosen from the following intervals: J (coupling
constant) $\in [-3, 3]$; T (temperature) $\in [0.05, 4.05]$; and B
(external field) $\in [0, 3]$.

For maximal randomness ($h_\mu = 1$), the excess entropy
E vanishes, as expected. At $h_\mu = 1$, corresponding to
infinite temperature, the spins decouple and there is no
information shared between them. But the excess entropy
does more than satisfy the "boundary conditions"
of vanishing for $h_\mu = 0$ and 1. The interpretation of E as
the memory stored in spatial configurations holds for intermediate
values of $h_\mu$ as well. As a result, Fig. 2 lets
us place an upper bound on the memory stored in spatial
configurations for a spin-1/2 nearest neighbor Ising
model: $E \le 1 - h_\mu$. This result, derived analytically
in ref. [23], applies to all one-step Markov chains over a
binary alphabet. 
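
The bound can be explored numerically for binary one-step Markov chains. The sketch below relies on an assumption not spelled out in the text above: for a stationary one-step chain, the mutual information between the two semi-infinite halves reduces to the mutual information between adjacent variables, so that E equals the single-variable entropy minus the entropy rate. The parameter names t01 and t10 are hypothetical.

```python
import math

def h2(q):
    """Binary entropy in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def markov_measures(t01, t10):
    """Entropy rate and excess entropy (bits) of a stationary binary one-step chain.

    t01 = Pr(next = 1 | current = 0) and t10 = Pr(next = 0 | current = 1),
    both assumed to lie strictly between 0 and 1.
    """
    p1 = t01 / (t01 + t10)                # stationary Pr(X = 1)
    p0 = 1.0 - p1
    h_mu = p0 * h2(t01) + p1 * h2(t10)    # conditional entropy of the next symbol
    excess = h2(p1) - h_mu                # mutual information of adjacent symbols
    return h_mu, excess

grid = [i / 20 for i in range(1, 20)]
points = [markov_measures(a, b) for a in grid for b in grid]
worst_gap = min((1 - h) - e for h, e in points)
print(worst_gap)
```

Under this assumption the bound follows at once, since the single-variable entropy of a binary variable is at most one bit; chains whose rows are identical (independent variables) give E = 0.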

We conclude this section by noting that there is a grow- 
ing body of evidence indicating that, aside from the re- 
quirement of vanishing at the ordered and disordered ex- 
tremes, entropy and "complexity" (defined in a number 
of different ways to reflect "structure") are more or less 
independent. That is, there is a vastly wider range of 
complexity versus entropy relationships than indicated 
by eq. (15) |l|,|l|,||. Fig. 2 is just one example of
many possible statistical complexity-entropy density re- 
lationships. 



IV. CONCLUSION 

To summarize, we have shown that Clmc vanishes
in the thermodynamic limit for finite-memory regular
Markov chains. This class of systems includes, at a minimum,
all finite-range one-dimensional spin systems. We
have also shown that Clmc is not an extensive variable.
We have proposed modifying Clmc, replacing the "disequilibrium"
of ref. [26] with the relative entropy with
respect to the uniform distribution. This results in a
quantity $C'$ that grows appropriately in the thermodynamic
limit, making it possible to define a meaningful
statistical complexity density that, nonetheless, retains
the spirit of Clmc. However, the product of this modification
is a quantity that is a trivial, "over-universal"
function of the entropy density $h_\mu$. In short, based on
the above observations, it seems to us that Clmc and
$C'$ may be of little use in measuring the complexity of a
statistical mechanical system.

We conclude by pointing out that the "boundary conditions"
of vanishing in the extreme ordered and disordered
limits do not uniquely specify a measure of complexity,
an observation also made by Anteneodo and Plastino [27].
In fact, if this is the only feature one demands
of a complexity measure, it's not clear to us why one 
would be motivated to devise a new statistic at all. 

Statistical mechanics, for example, is replete with func- 
tions that vanish in the high and low temperature limits. 
Since thermodynamic entropy, a measure of randomness, 
is a monotonic function of temperature, high (low) tem- 
perature corresponds to high (low) randomness. Exam- 
ples of quantities that vanish in these extremes (assuming 
there is not a critical point at T = 0) include the con- 
nected correlation functions, the correlation length, and 
magnetic susceptibility. These functions can be easily 
applied to any probability distribution describing a spa- 
tially or spatio-temporally extended collection of random 
variables. 

Information theory also comes equipped with a func- 
tion that vanishes for perfectly ordered and disordered 
systems: the mutual information I. If two random
variables Z and Y are independently distributed, then
the mutual information between them, I[Z; Y], vanishes.
At the other extreme, if Z and Y are both known with
certainty (that is, H[Z] = H[Y] = 0), then I[Z; Y] also
vanishes. For statistical dependencies between these extremes,
I[Z; Y] is positive and measures the amount of
information shared between Y and Z. 
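
A minimal sketch of this behavior, using the standard definition of mutual information over a joint probability table. The three 2x2 tables are hypothetical examples chosen to represent the independent, certain, and intermediate cases.

```python
import math

def mutual_information(joint):
    """Mutual information in bits from a joint probability table joint[z][y]."""
    pz = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    total = 0.0
    for z, row in enumerate(joint):
        for y, q in enumerate(row):
            if q > 0:
                total += q * math.log2(q / (pz[z] * py[y]))
    return total

independent = [[0.25, 0.25], [0.25, 0.25]]   # Z and Y independent
certain = [[1.0, 0.0], [0.0, 0.0]]           # both known with certainty
correlated = [[0.4, 0.1], [0.1, 0.4]]        # partially dependent

for name, table in [("independent", independent),
                    ("certain", certain),
                    ("correlated", correlated)]:
    print(name, mutual_information(table))
```

The first two tables give zero and the third a positive value, so mutual information already satisfies the ordered and disordered "boundary conditions" without any new construction.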

Given that there are many functions that vanish in the 
extreme ordered and disordered limits, it is clear that re- 
quiring this property does not sufficiently restrict a sta- 
tistical complexity measure. What other criteria can we 
use, then, to guide us as we attempt to detect structures 
and patterns in nature? To this question we offer two 
suggestions. 

First, it is helpful if the statistical complexity mea- 
sure has a clear interpretation: What exactly is the sta- 
tistical complexity measuring? The two English words 
"statistical complexity" do not sufficiently answer the 
question. Many of the statistical complexity measures 
proposed over the last decade or so do have clear interpretations.
For example, the excess entropy may be interpreted
as the mutual information between two halves
of an infinite configuration |^,|l^,|3| . Logical depth is the 
run time required by a universal Turing machine execut- 
ing the minimal program to reproduce a given pattern. 
These unambiguous interpretations help put these statis- 
tical complexity measures on a solid footing. 

Second, it is essential to consider the motivations be- 
hind a measure of statistical complexity: How is the mea- 
sure to be used? What questions might it help answer? 
It is possible to meaningfully assess its utility only if 
the motivations and goals for defining a complexity mea- 
sure are stated clearly. One set of issues is the detection 
and quantification of patterns produced by a process. It 
has been proposed that particular notions of structure 
adapted from computation theory capture the intrinsic 
"patterns" and information processing architecture em- 
bedded in a system. In this setting, one finds well-defined 
and easily interpreted measures of statistical complexity 
Ip^ . For other views on questions that a measure of statistical
complexity might help answer, see ref. [28].

Finally, ref. mentions several different notions of 
complexity and notes that "there is not yet a consen- 
sus on a precise definition." "Complexity" has accepted 
meanings in other fields — meanings established prior to 
the recent attempts to use it as a label for structure in 
natural systems. For example, Kolmogorov-Chaitin com- 
plexity in algorithmic information theory means some- 
thing quite different from computational complexity in 
the analysis of algorithms. These, in turn, are each dif- 
ferent from the stochastic complexity used in model-order 
estimation in statistics. Though at a future date relationships
may be found, at present all of these are different
from the notions of statistical complexity discussed
here.

Unfortunately, "complexity" has been used without 
qualification by so many authors, both scientific and non- 
scientific, that the term has been almost stripped of its 
meaning. Given this state of affairs, it is even more im- 
perative to state clearly why one is defining a measure of 
complexity and what it is intended to capture. 

We thank Melanie Mitchell and Karl Young for helpful 
comments on the manuscript. This work was partially 
supported at UC Berkeley by ONR grant N00014-95-1- 
0524 and at the Santa Fe Institute by ONR grant N00014- 
95-1-0975.

APPENDIX A: D VANISHES EXPONENTIALLY
FAST FOR REGULAR MARKOV CHAINS

We will show that D goes to zero exponentially fast 
with increasing system size for a Markov chain governed 
by a regular transition matrix $T_{ab} = \Pr(b|a)$, where $0 \le
T_{ab} \le 1$ and there exists a $K < \infty$ such that $(T^K)_{ab} > 0$
for all a and b. Since the conditional probabilities are
normalized, the matrix T is stochastic: $\sum_{b=1}^{k} T_{ab} = 1$.

The probability of a block of L consecutive variables
taking on the values $x_1, x_2, \ldots, x_L$ is given by

$$ \Pr(x_1, x_2, \ldots, x_L) = p_{x_1} T_{x_1 x_2} T_{x_2 x_3} \cdots T_{x_{L-1} x_L}, \tag{A1} $$

where p is the stationary distribution of a single variable,
as given by the left eigenvector of T with eigenvalue 1.
The eigenvector p is chosen so as to be normalized in
probability, $\sum_{a=1}^{k} p_a = 1$.

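Let $\tilde{T}$ be a matrix whose components $\tilde{T}_{ab}$ are given
by $(T_{ab})^2$. Note that $\tilde{T} \neq T^2$. Similarly, let $\tilde{p}$ be a
vector whose components $\tilde{p}_a$ are given by $(p_a)^2$. Eq. (6)
indicates that we are interested in

$$ \sum_{\{x^L\}} \Pr(x^L)^2 = \sum_{\{x^L\}} \tilde{p}_{x_1} \tilde{T}_{x_1 x_2} \tilde{T}_{x_2 x_3} \cdots \tilde{T}_{x_{L-1} x_L}. \tag{A2} $$

The sum runs over all configurations of length L. The
effect of the sum is to multiply the matrices together:

$$ \sum_{\{x^L\}} \Pr(x^L)^2 = \sum_{a=1}^{k} \sum_{b=1}^{k} \tilde{p}_a \big(\tilde{T}^{L-1}\big)_{ab}. \tag{A3} $$

We shall show that in the $L \to \infty$ limit the above expression
goes to zero exponentially fast.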

We begin by considering the vector $V = \tilde{p}\,\tilde{T}^{L-1}$ and
its $L_\infty$ norm, $\|V\| = \max\{|V_1|, |V_2|, \ldots\}$. Eq. (A3) may
be rewritten in terms of V,

$$ \sum_{\{x^L\}} \Pr(x^L)^2 = \sum_{i=1}^{k} V_i. \tag{A4} $$

Since V is finite dimensional and all elements are
nonnegative, if $\|V\|$ goes to zero exponentially fast,
$\sum_{\{x^L\}} \Pr(x^L)^2$ must also go to zero exponentially fast.
To show the former we use some well-known properties
of vector and matrix norms [30].

Consider the matrix norm induced by the $L_\infty$ vector
norm:

$$ \|\tilde{T}\| = \max_a \Big\{ \sum_b \tilde{T}_{ab} \Big\}. \tag{A5} $$

Any matrix norm is compatible with its associated vector
norm:

$$ \|V\| = \|\tilde{p}\,\tilde{T}^{L-1}\| \le \|\tilde{p}\|\, \|\tilde{T}^{L-1}\|. \tag{A6} $$

Recall that the components of $\tilde{p}$ are the square of the
components of the stationary probability p of the Markov
chain. Except for the trivial case in which there is only
one symbol in our chain and T is a one-by-one matrix,
the maximum component of $\tilde{p}$ is less than one. Thus,
$0 < \|\tilde{p}\| < 1$ and we have

$$ \|V\| \le \|\tilde{p}\|\, \|\tilde{T}^{L-1}\| < \|\tilde{T}^{L-1}\|. \tag{A7} $$

By assumption, there exists a K such that $0 <
(T^K)_{ab} < 1$ for all a and b. Each element of $T^K$ is a
sum of terms that are products of T's elements. Likewise,
each element of $\tilde{T}^K$ is a sum of terms that are
products of $\tilde{T}$'s elements. However, since $\tilde{T}_{ab} = (T_{ab})^2$
and $0 \le T_{ab} \le 1$, it follows that each component of $\tilde{T}^K$
is strictly less than the corresponding component of $T^K$.
The product of stochastic matrices is itself a stochastic
matrix, so $\sum_{b=1}^{k} (T^K)_{ab} = 1$. Thus, since each component
of $\tilde{T}^K$ is less than the corresponding component of $T^K$,
$\sum_{b=1}^{k} (\tilde{T}^K)_{ab} < 1$. As a result, $\|\tilde{T}^K\| < 1$.

Now, by the consistency condition obeyed by all matrix
norms, $\|\tilde{T}^2\| \le \|\tilde{T}\|^2$. Rewriting eq. (A7), we have

$$ \|V\| < \|\tilde{T}^{L-1}\| = \|\tilde{T}^{K(L-1)/K}\|. \tag{A8} $$

In the $L/K \to \infty$ limit (equivalent to the $L \to \infty$ limit
since K is finite) we then have by the consistency condition:

$$ \|V\| < \|\tilde{T}^K\|^{(L-1)/K}. \tag{A9} $$

Since $\|\tilde{T}^K\| < 1$ we see that $\|V\|$ is bounded above by a
function that decreases exponentially in L. Hence $\|V\|$
itself also decreases exponentially in L.